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ABSTRACT 

ComiR is a web tool for combinatorial microRNA 
(miRNA) target prediction. Given an messenger 
RNA (mRNA) in human, mouse, fly or worm 
genomes, ComiR computes the potential of being 
targeted by a set of miRNAs, each of which can 
have zero, one or more targets on its 3 untranslated 
region. In determining the regulatory potential of 
an mRNA from a set of miRNAs, ComiR uses 
user-provided miRNA expression levels in a combin- 
ation of appropriate thermodynamic modeling 
and machine learning techniques to make more 
accurate predictions. For each gene, ComiR 
returns the probability of being a functional target 
of a set of miRNAs, which depends on the relative 
miRNA expression levels. The tool provides a user- 
friendly interface to input a miRNA expression table 
containing many sample information and filter out 
the most relevant miRNAs. ComiR results can be 
downloaded or visualized on a table, which can 
then be used to select the most relevant targets 
and to compare the results obtained with different 
miRNA expression input. ComiR is freely available 
for academic use at http://www.benoslab.pitt.edu/ 
comir/. 



INTRODUCTION 

Micro RNAs (miRNA) are a class of short (18-25 nucleo- 
tide) non-coding RNAs that regulate gene expression post- 
transcriptionally. Their regulatory activity depends heavily 
on the recognition of binding sites located mainly on the 3'- 
untranslated regions (3'UTRs) of target messenger RNA 
(mRNA) (1). Existing computational tools predict the 
miRNA targets by considering site-specific factors of 
target sites [see (2-4) for some examples]. These tools, 
tested against the available experimentally validated 



miRNA-target pairs each, were able to predict only a 
portion of them, while their target overlap remains poor 
(5,6). However, the efficiency of miRNA-mediated regula- 
tion can be affected by multiple system-wide factors (7,8), 
such as miRNA expression, combinatorial binding of 
multiple miRNA targets or the expression of competitive 
endogenous RNA. ComiR addresses two of them, namely 
how miRNA expression affects binding and how combina- 
torial miRNA binding affects mRNA regulation. We note 
that some recent algorithms also consider these factors 
when they decide on targets: PicTar (9) is a computa- 
tional method for identifying common targets of 
miRNAs, but it does not consider the miRNA expression 
in determining the relative binding; GenMir++ (10), 
TargetMiner (11) and TaLasso (12) evaluate the relevance 
of miRNA: mRNA interactions by analyzing expression 
profiles and prior targeting information, but they require 
paired mRNA and miRNA expression profiles over many 
samples. Furthermore, none of them incorporates miRNA 
expression in the binding model. 

Recent experimental approaches, like those based on 
immunoprecipitation (IP) of miRISC proteins (RNA- 
induced silencing complex) (13,14), generate validated 
miRNA:mRNA target data in a high-throughput fashion. 
These data are useful to unravel the systemic aspects of 
miRNA targeting. Indeed, miRNA expression profiles are 
now known in several tissues and cell lines (15) and as a 
result it becomes more important to know the targets of a 
set of miRNA genes instead of individual miRNAs. Of 
course, one might infer the targets of a set from the 
targets of its members, but as we have recently shown, a 
naive inference model does not produce good results (16). 

To this end, we developed ComiR (http://www. 
benoslab.pitt.edu/comir/), an algorithm that predicts the 
regulatory potential of a set of miRNA genes toward 
a given mRNA. ComiR initially integrates miRNA ex- 
pression in a scoring scheme for each single target on 
this mRNA, and then it additively combines the 
weighted scores of the single targets. ComiR uses four 
popular scoring schemes [the ones used by miRanda (4), 
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PITA (2), TargetScan (3) and mirSVR (17)] and at the 
final stage it combines all four integrative scorings into 
one using a support vector machine (SVM) trained on 
Drosophila AGOl IP data (18). The end result is the 
SVM probability, which represents the likelihood that 
this mRNA is regulated by the set of miRNA genes. 
Our results (16) have shown that ComiR presents a 
significant improvement over standard miRNA prediction 
algorithms adapted for combinatorial miRNA targeting. 

TOOL DESCRIPTION 

General framework 

ComiR is a tool designed to predict the targets of a set of 
miRNAs by considering the miRNA expression levels as 
an integral part of the target decision process. The user 
has the option to add his/her own target set of genes (e.g. 
3'UTRs of mRNAs, LINCs) to the default list of genes 
considered for the available species {Homo sapiens, Mus 
musculus, Drosophila melanogaster and Caenorhabditis 
elegans) Below, without loss of generality, references to 
'target sequences' or 'mRNAs' should include the 
3'UTRs of the mRNAs together with any other custom 
target sequence set the user submits. 

For each mRNA, ComiR determines the potential of 
being targeted by the miRNA set. The targeting potential 
is calculated in two steps. In the first step, four different 
methods are used that complement each other to some 
extent. One method is an adaptation of miRanda (4), in 
which the probability of an mRNA:miRNA binding is 
calculated based on the Fermi-Dirac equation (16,19), 
which takes into consideration the miRNA expression, 
and the individual probabilities are summed over all 
targets on this mRNA of all miRNAs in the miRNA 
set. The second method is a similar adaptation of PITA 
(2), in which the Fermi-Dirac equation also substitutes the 
standard energies. As the third method, we use the 
TargetScan (3) scoring (without conservation) weighted 
by each miRNA expression level. Finally, mirSVR (20) 
is also used, but its scores are also combined after they 
have been weighted by miRNA expression. In the second 
step, we combine the predictions of the above four 
methods with a SVM, which is trained on a high-quality 
data set derived from AGOl IP experiments in 
D. melanogaster. It has been shown (16) that this model 
can be efficiently applied to predict miRNA targets in 
other species, after a normalization step. 

ComiR's output is a ranked list of genes based on the 
target probability computed through the SVM model. 
Because the score computation includes the miRNAs ex- 
pression values, the same set of miRNAs might generate 
different ranking if different miRNA expression levels are 
used. ComiR's pipeline is shown in Figure 1. 

Databases and pre-computed binding scores 

The ComiR help page provides links to the FASTA files 
with the mRNA and miRNA sequences used to run 
ComiR. We restricted our analysis on the mRNA 
3'UTR sequences that we downloaded from the 
ENSEMBL Web site (as in 20 September 2012). When 



more than one 3'UTR sequence is associated to the 
same ENSEMBL gene ID, we use the longest sequence. 
miRNAs mature sequences were downloaded from the 
miRBase Web site (Release 19). An internal ID conversion 
tool is used to convert the old miRNA IDs with the 
updated ones. Regardless, we suggest the users to refer 
to the latest version of miRBase official ID. 

miRNA binding scores have been pre-computed for 
every 3'UTR gene sequence according to the original 
miRanda, PITA, TargetScan and mirSVR scoring systems 
and they are stored on our server for a fast access. PITA 
and TargetScan ran with their default parameters. 
The miRanda' s parameters have been relaxed to include 
in the predictions the weaker binding sites (score = 0, 
energy = 0). Finally, we used the mirSVR prediction files 
labeled as 'Good mirSVR score, Conserved miRNA'. The 
single target scores we use as basis to calculate the overall 
regulatory potential of each miRNA:mRNA pair are the 
following, (i) miRanda-type scoring: single target score is 
the binding energy of each binding site, (ii) PITA-type 
scoring: single target score is the difference between the 
free energy gained by the binding of the miRNA to the 
target and the free energy lost by unpairing the target-site 
nucleotides (AAG) of each binding site, (iii) TargetScan- 
type scoring: single target score is the number of detected 
binding sites (no phylogenetic conservation features are 
used), (iv) mirSVR scoring: single target score is mirSVR 
score of the predicted target sites. 

Input formats 

ComiR accepts miRNA expression levels input as comma- 
separated field table format. The first row contains the 
sample IDs where miRNA expression levels are detected 
and the first column contains miRNA IDs. If miRNA 
expression levels are expressed in log scale (e.g. from 
microarray measurements), ComiR will internally 
convert them into the linear scale, provided the user 
checks the corresponding checkbox. We also allow users 
to submit a list of miRNA IDs without expression levels, 
in which case ComiR assumes that all miRNAs are ex- 
pressed at the same level. Custom mRNA sequences can 
be inserted with FASTA file format. 

Data pre-processing 

The input miRNA IDs are first converted into the latest 
miRBase IDs. The miRNAs that are not found in the 
stored data set related to the selected species are 
removed from the analysis and the user is notified. If 
miRNA expression levels are provided, the miRNA con- 
centrations are computed as the fraction of expression 
level over the total amount, for each sample. It is 
possible to focus the analysis on the top expressed 
miRNA, by selecting the percentage of total miRNA 
abundance to be covered in the 'More Options' panel, 
and/or a threshold for the minimum expression level to 
be considered. It is also possible to input a list of miRNAs 
to be excluded from the filtering, even if their expression 
level does not reach the desired threshold. 

The list of miRNAs can be treated in two different 
ways, i.e. as a set of miRNAs or as single miRNAs. 
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Figure 1. Overview of the integrated analysis in ComiR. 



One of the two options can be selected in the 'More 
Options' panel. If the user wants the miRNAs in the list 
to be considered as a set, ComiR will compute one 
score for each gene. For multiple samples, ComiR will 
calculate one score per gene per sample. If the user 
wants each miRNA in the list to be considered inde- 
pendently, ComiR will compute one set of scores per 
miRNA, by using a concentration equal to 1 for each 
of them even if the miRNA input contains expression 
levels. 



If custom target sequences are provided, ComiR will 
run the publicly available scripts of miRanda, PITA and 
TargetScan tools, to obtain the initial binding energy and 
target prediction scores and these values will be trans- 
formed and combined as we describe above. 

Score combination and method integration 

As we mentioned above, ComiR uses four complementary 
methods to assess the binding potential of a miRNA to a 



W162 Nucleic Acids Research, 2013, Vol. 41, Web Server issue 



single target in each 3'UTR. One is based on simple 
binding energies, which we take from the miRanda 
scoring file. The second is based on binding energies 
after the mRNA secondary structure has been factored 
in. For this we use the PITA scoring file. The third is 
the number of string matches of the miRNA seed 
sequence, which we take from the TargetScan primary 
predictions. And the fourth method is the prediction 
score of mirSVR. ComiR transforms these primary 
scores of individual targets using the miRNA expression 
and sums the transformed scores to account for multiple 
miRNA targets. The summing is a simple weighted sum in 
the case of TargetScan and mirSVR types of scores, 
whereas in the case of miRanda and PITA types we use 
the Fermi-Dirac equation. The details are provided in 
(16). Briefly, the Fermi-Dirac model is the following: 



i j 

where /x/ = RT\og([miRNAi\), N is the number of 
miRNAs in the input set, n ik is the number of binding 
sites predicted for the miRNAi:mRNA k pair and for 
each of them E ijk is the binding energy computed for the 
binding site j. For the seed matching scores and the 
mirSVR scores we use a simpler scoring scheme 
(weighted sum): 

S^SUM = j^ [miRNAi]s . k 

i 

At the end of this step, each gene is associated with four 
scores, one for each primary method. These four scores 
are then used as input to the SVM model, which has been 
calculated on a D.melanogaster AGOl IP data set (16). To 
use the SVM to efficiently predict targets in other species, 
the algorithm includes a normalization step. Specifically, 
the input to the SVM is the ranking of the genes with 
respect to the computed combined scores. This expedient 
avoids the biases due to different prediction scores distri- 
butions in different species and to the variety of miRNAs 
the user may use as input. Finally, the SVM computed 
probability is chosen as the ComiR score referring to the 
probability of an mRNA to be regulated by the set of 
miRNAs with the specific expression levels. 

Submission and wait time 

Depending on the size of the input files and the chosen 
species, the analysis could take anywhere from seconds to 
up to an hour. As an example, for D.melanogaster predic- 
tions with 25 miRNAs, the computing time is roughly 
25 s, while the same number of miRNAs in H. sapiens 
predictions is processed in roughly 5min. The inclusion 
of custom mRNA sequences will increase the execution 
time on the server linearly with the number and the 
length of the inserted sequences. For instance, processing 
one custom sequence of 3000 bases and 25 miRNAs 
requires about 45 extra seconds. 

After submission, a summary page describing the input 
and the results of the data pre-processing is shown. If the 



input setting needs to be adjusted then a 'Go to Input 
settings' button leads back to the input page. Otherwise 
the 'Go to Results' button brings to an execution log page. 
This page will contain the results once the job finishes and 
the link to it can be saved so it can be visited later. 

ComiR output 

The user can decide to download the complete results in a 
table format or to browse and filter the results table in the 
web page. The downloadable results zip file contains the 
Ensembl ID and the Official Gene ID columns followed 
by the computed ComiR scores columns, one per each 
miRNA set present in the miRNA input file. If custom 
mRNA sequences have been inserted, the file will also 
contain a row for each inserted sequence. 

The results web page shows a table with the first three 
columns describing the genes followed by the computed 
ComiR score columns. The genes are identified by the 
Ensembl ID, Entrez ID and Official Gene Name. 
Ensembl IDs and Entrez IDs are linked to the correspond- 
ent external database. The user can sort the table rows by 
any header and in ascending or descending order. It is 
possible to modify the shown results table by removing 
all the genes not predicted as targets by any of the input 
miRNA sets, by choosing the ComiR score threshold 
above which the genes has to be considered as targets. 
Finally, if more than one miRNA set is inserted, it is 
possible to visualize an extra column containing the dif- 
ference between the ComiR scores obtained with any two 
miRNA sets. Moreover, if enough data are available, it is 
possible to perform a Wilcoxon analysis comparing the 
ranks of the genes in ComiR predictions of two groups 
of at least three miRNA sets. In fact, it is possible to select 
the composition of the two groups of samples to be 
compared and a new column containing the Wilcoxon- 
test P-value will be shown. In this case, the score differ- 
ence column will contain the difference between the 
average ComiR scores in the two groups. The genes with 
significant P-values are what we call the 'differentially pre- 
dicted' genes. The results visualization and the differen- 
tially predicted genes analysis can also be performed by 
considering the rank of the genes with respect of the 
associated ComiR score instead of the ComiR score itself. 

CASE STUDY SCENARIOS 

The primary goal of ComiR is to rank the targets of a set 
of miRNAs with known expression levels in the sample of 
interest. In (16) we described some applications of ComiR 
in this basic direction, where we predicted the targets of 
D.melanogaster, C.elegans and H. sapiens miRNA sets and 
showed that ComiR outperforms existing target predic- 
tion tools. Moreover, ComiR can be applied to perform 
more elaborate analysis, and some examples are described 
above. 

Single-nucleotide polymorphisms effects on 
miRNA binding 

Single-nucleotide polymorphisms (SNP) can be located in 
miRNA binding regions and consequently they could 
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affect gene expression. ComiR can be used to evaluate the 
SNP's effect on miRNA regulation by inserting as input 
the wild type and the SNP mRNA sequence and ranking 
the evaluated ComiR scores. As an example, in (16) we 
evaluated the effect of 15 high-frequency SNPs in ERoc 
pathway genes on the affinity of the affected genes with 
every known human miRNA. We found that rs 17737058, 
located in the 3'UTR of NCOA1 gene, causes the disrup- 
tion of the binding with the hsa-miR-488* miRNA. This 
effect has been experimentally validated, and we also 
found that this SNP is associated with decreased bone 
mineral density. 

Differentially predicted genes 

This analysis is thought to compare the target predictions 
obtained with different miRNA sets. It is, for instance, 
conceived for data sets containing replicates of miRNA 
expression levels data in different samples. The differen- 
tially predicted genes analysis can be applied as an alter- 
native strategy to predicting the targets of differentially 
expressed miRNAs. In fact, with the same list of 
miRNAs, ComiR generates different results, depending 
on the concentrations associated with the miRNAs. It is 
worth noticing that while predicting the targets of the dif- 
ferentially expressed miRNAs does not take into account 
the presence of all the other miRNAs that might prevail 
over them, ComiR can be used to predict the targets of the 
top-expressed miRNAs in each sample, and consequently 
used to search for which genes are highly predicted in one 
group of samples and not in the other (21). 

Single miRNA contribution 

ComiR algorithm incorporates the information about all 
the input miRNAs into a single score. If the user is inter- 
ested in determining which miRNAs contributed more to 
the formation of one specific gene's score, we suggest to 
run ComiR with the same miRNA set, but with the option 
'single miRNAs' activated. This analysis will run a 
separated ComiR analysis for each miRNA in the set. 
The higher is the contribution of a specific miRNA to 
the ComiR score associated to the entire miRNA set, 
the higher will be the ComiR score associated to the 
single miRNA. 



CONCLUSIONS 

In recent years, with the availability of high-throughput 
miRNA expression data, a new perspective for miRNA 
target prediction is starting to emerge. Many target pre- 
diction tools already exist, but none of them takes into 
account system-level factors, like the co-expression of 
several miRNAs with sample-specific expression levels. 
ComiR has been developed specifically to predict the 
targets of sets of miRNAs, and its user-friendly interface 
allows researchers to compare the results obtained with 
groups of them. A number of issues remain unsolved, 
however, such as the competition of miRNAs for the 
Ago proteins (22), or the extent of the co-operative 
effect of miRNAs. As more relevant data sets become 



available we will extend the miRNA binding models to 
incorporate these effects. 
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